fix(bgz-tensor): resolve 5 CI-invisible test failures (3 stale tests + 1 slice-bounds bug)#506
Conversation
…+ 1 slice-bounds bug) bgz-tensor is excluded from CI (no workflow tests its manifest), so these reds accumulated unseen on main — surfaced by a local sweep of the CI-uncovered standalone crates. Diagnosed each as test-stale vs code-bug before fixing; did not blanket-bump assertions. Test-stale (code was correct, expected values drifted): - gamma_calibration: byte_size() is 48 (RoleGamma 36 + CosineGamma 12), not 40 — the struct has 8 roles (32B)+phi_scale(4B)=36B; the stale '28+12' comment assumed 7 roles. - hhtl_cache: serialize() writes a 16-byte gamma_meta trailer that the test's size formula omitted; added + 16. - hhtl_d: 0x3C00 is the IEEE-half (F16) bit pattern for 1.0; real BF16 1.0 is 0x3F80. bf16_to_f32 was correct; the test literal was wrong. Code-bug (genuine robustness fix, both encode + decode): - matryoshka encode_row/decode_row panicked when the SVD basis rank is lower than the band profile's nominal max (fewer sample rows than requested components → a band extends past the available coeffs). encode: clamp slice start (was only clamping end); decode: cap the per-band component count to the coeff buffer. Symmetric, so the byte stream stays in sync; roundtrip + quality assertions pass. Verified: bgz-tensor 200 passed / 0 failed (was 195/5); fmt clean; clippy -p bgz-tensor --all-targets -D warnings clean.
|
Warning Review limit reached
More reviews will be available in 15 minutes and 2 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Why
bgz-tensoris excluded from CI — no workflow tests its manifest (build.yml/rust-test.yml/style.ymlonly coverlance-graph,lance-graph-contract,deepnsm,jc). As a resultcargo test -p bgz-tensorwas 195 passed / 5 failed onmain, and had been for an unknown number of PRs. Surfaced by a local sweep of the CI-uncovered standalone crates (the same gap that hid thecausal-edgered fixed in #498).Each red was diagnosed as test-stale vs code-bug before fixing — no assertions were blanket-bumped to green (which would have papered over the real defect).
The 5 failures
Test-stale (the code was correct; the test's expected value had drifted):
gamma_calibration::calibration_profile_size40byte_size()is genuinely 48 =RoleGamma36 B ([f32;8]=32 +phi_scale=4) +CosineGamma12 B. The stale// 28 + 12comment assumed 7 roles; the struct has 8.hhtl_cache::test_hhtl_cache_256_sizeserialize()writes a 16-bytegamma_metatrailer ([lo_gamma, hi_gamma, phi_scale, role_id]) that the test's byte formula omitted. Added+ 16.hhtl_d::hhtl_d_entry_roundtrip0x3C00"BF16 1.0"0x3C00is the IEEE-half (F16) bit pattern for 1.0 (decodes to 0.0078125 under BF16). Real BF16 1.0 is0x3F80.bf16_to_f32(top-16-bits-of-f32) was correct; the literal was wrong.Code bug (a genuine robustness defect, not a stale test):
matryoshka::{encode_decode_roundtrip_nonzero, roundtrip_quality_reasonable}panicked with an out-of-bounds slice when the SVD basis rank is lower than the band profile's nominal max — e.g. buildingSvdBasis::build(rows, 128)from only 50 sample rows yields a rank-50 basis, butBandProfile::standard(128, …)has bands starting at 64/192.encode_rowslicedcoeffs[band.start..]with an unguardedstart;decode_rowwrotecoeffs[band.start + i]past the buffer.start(it was only clampingend); decode caps the per-band component count to the coeff buffer (n.min(coeffs.len().saturating_sub(band.start))). A band entirely beyond the available rank now degrades to empty on both sides, so the byte stream stays in sync — the roundtrip + quality assertions (cosine > 0.8,len == 256) pass.This one was a latent crash, not merely a stale test: any production use with a rank-deficient basis (fewer sample rows than requested components) would panic.
Verification (local — this sandbox has
protoc+ full toolchain)Diff is 9 insertions / 6 deletions across 4 files — the three one-line test corrections (each with a corrected explanatory comment) plus the two-site
matryoshkaguard.Follow-up (not in this PR)
The root cause is structural: CI tests only 4 of ~30 crates. Two CI-invisible reds have now been found and fixed this session (
causal-edge#498,bgz-tensorhere). A separate PR should add the disk-cheap standalone crates (bgz-tensor,bgz17,causal-edge,helix,highheelbgz) torust-test.ymlso this class of regression is caught at PR time. Flagging for the workflow owner rather than bundling a CI change into a test-fix PR.Generated by Claude Code